{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "perfect-gossip",
   "metadata": {},
   "source": [
    "# Installation\n",
    "\n",
    "Interpret is supported across Windows, Mac and Linux on Python 3.5+ \n",
    "\n",
    "````{tab-set}\n",
    "```{tab-item} pip\n",
    "pip install interpret\n",
    "```\n",
    "```{tab-item} conda\n",
    "conda install -c conda-forge interpret\n",
    "```\n",
    "```{tab-item} source\n",
    "git clone https://github.com/interpretml/interpret.git && cd interpret/scripts && make install\n",
    "```\n",
    "````\n",
    "\n",
    "InterpretML supports training interpretable models (**glassbox**), as well as explaining existing ML pipelines (**blackbox**).\n",
    "Let's walk through an example of each using the UCI adult income classification dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "convenient-karma",
   "metadata": {},
   "source": [
    "# Download and Prepare Data\n",
    "\n",
    "First, we will load the data into a standard pandas dataframe or a numpy array, and create a train / test split. There's no special preprocessing necessary to use your data with InterpretML."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "grand-angle",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "df = pd.read_csv(\n",
    "    \"https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data\",\n",
    "    header=None)\n",
    "df.columns = [\n",
    "    \"Age\", \"WorkClass\", \"fnlwgt\", \"Education\", \"EducationNum\",\n",
    "    \"MaritalStatus\", \"Occupation\", \"Relationship\", \"Race\", \"Gender\",\n",
    "    \"CapitalGain\", \"CapitalLoss\", \"HoursPerWeek\", \"NativeCountry\", \"Income\"\n",
    "]\n",
    "train_cols = df.columns[0:-1]\n",
    "label = df.columns[-1]\n",
    "X = df[train_cols]\n",
    "y = df[label]\n",
    "\n",
    "seed = 42\n",
    "np.random.seed(seed)\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "shaped-calcium",
   "metadata": {},
   "source": [
    "# Train a Glassbox Model\n",
    "\n",
    "**Glassbox** models are designed to be completely interpretable, and often provide similar accuracy to state-of-the-art methods.\n",
    "\n",
    "InterpretML lets you train many of the latest glassbox models with the familiar scikit-learn interface."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "regulated-share",
   "metadata": {},
   "outputs": [],
   "source": [
    "from interpret.glassbox import ExplainableBoostingClassifier\n",
    "ebm = ExplainableBoostingClassifier()\n",
    "ebm.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "qualified-friendship",
   "metadata": {},
   "source": [
    "# Explain the Glassbox\n",
    "\n",
    "Glassbox models can provide explanations on a both global (overall behavior) and local (individual predictions) level.\n",
    "\n",
    "**Global explanations** are useful for understanding what a model finds important, as well as identifying potential flaws in its decision making (i.e. racial bias)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "nutritional-ozone",
   "metadata": {},
   "outputs": [],
   "source": [
    "from interpret import set_visualize_provider\n",
    "from interpret.provider import InlineProvider\n",
    "set_visualize_provider(InlineProvider())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "empty-wisdom",
   "metadata": {},
   "source": [
    "````{margin}\n",
    "```{note}\n",
    "**The inline visualization embedded here is exactly what gets produced in the notebook.**\n",
    "\n",
    "For this global explanation, the initial summary page shows the most important features overall. You can use the dropdown to search, filter, and select individual features to drill down deeper into.\n",
    "\n",
    "Try looking at the \"Age\" feature to see how the probability of high income varies with Age, or the \"Race\" or \"Gender\" features to observe potential bias the model may have learned.\n",
    "```\n",
    "````"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "orange-timing",
   "metadata": {},
   "outputs": [],
   "source": [
    "from interpret import show\n",
    "\n",
    "ebm_global = ebm.explain_global()\n",
    "show(ebm_global)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fjisdlkjsls",
   "metadata": {},
   "source": [
    "<br/>\n",
    "<br/>\n",
    "<br/>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "statewide-venezuela",
   "metadata": {},
   "source": [
    "**Local explanations** show how a single prediction is made. For glassbox models, these explanations are exact -- they perfectly describe how the model made its decision.\n",
    "\n",
    "These explanations are useful for describing to end users which factors were most influential for a prediction."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fallen-clearance",
   "metadata": {},
   "source": [
    "````{margin}\n",
    "```{note}\n",
    "In the local explanation for instance \"2\", the probability of high income was 0.93, largely due to having a high value for the CapitalGains feature.\n",
    "\n",
    "The values shown here are **log-odds** scores from the EBM, which are added and passed through a logistic-link function to get the final prediction, just like logistic regression.\n",
    "```\n",
    "````"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "injured-laundry",
   "metadata": {},
   "outputs": [],
   "source": [
    "ebm_local = ebm.explain_local(X_test[:5], y_test[:5])\n",
    "show(ebm_local, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "vkeklwkdjslkjs",
   "metadata": {},
   "source": [
    "<br/>\n",
    "<br/>\n",
    "<br/>\n"
   ]
  },
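  {
   "cell_type": "markdown",
   "id": "logit-link-sketch",
   "metadata": {},
   "source": [
    "The local explanation note above says that per-feature log-odds scores are added and passed through a logistic link. A minimal sketch of that arithmetic follows. The intercept and scores are made-up numbers chosen for illustration, not values read from the fitted EBM, and `sigmoid` and `predict_probability` are hypothetical helpers rather than part of the InterpretML API.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def sigmoid(log_odds):\n",
    "    # Logistic link: maps a log-odds score to a probability in (0, 1).\n",
    "    return 1.0 / (1.0 + math.exp(-log_odds))\n",
    "\n",
    "\n",
    "def predict_probability(intercept, feature_scores):\n",
    "    # Sum the intercept and every per-feature contribution, then apply the link.\n",
    "    total_log_odds = intercept + sum(feature_scores.values())\n",
    "    return sigmoid(total_log_odds)\n",
    "\n",
    "\n",
    "# Hypothetical contributions (in log-odds) for a single row.\n",
    "example_scores = {'Age': 0.4, 'CapitalGain': 3.1, 'HoursPerWeek': 0.2}\n",
    "example_intercept = -1.1\n",
    "\n",
    "probability = predict_probability(example_intercept, example_scores)\n",
    "print(round(probability, 3))\n",
    "```\n",
    "\n",
    "Because the contributions simply add up, one large positive score can dominate the prediction, which is the pattern the note describes for the capital gain feature."
   ]
  },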
  {
   "cell_type": "markdown",
   "id": "changed-cruise",
   "metadata": {},
   "source": [
    "# Build a Blackbox Pipeline\n",
    "\n",
    "**Blackbox interpretability** methods can extract explanations from any machine learning pipeline. This includes model ensembles, pre-processing steps, and complex models such as deep neural nets.\n",
    "\n",
    "Let's start by training a random forest that is first pre-processed with principal component analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "killing-necessity",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.decomposition import PCA\n",
    "from sklearn.pipeline import Pipeline\n",
    "\n",
    "# We have to transform categorical variables to use sklearn models\n",
    "X_enc = pd.get_dummies(X, prefix_sep='.')\n",
    "feature_names = list(X_enc.columns)\n",
    "y = df[label].apply(lambda x: 0 if x == \" <=50K\" else 1)  # Turning response into 0 and 1\n",
    "X_train, X_test, y_train, y_test = train_test_split(X_enc, y, test_size=0.20, random_state=seed)\n",
    "\n",
    "#Blackbox system can include preprocessing, not just a classifier!\n",
    "pca = PCA()\n",
    "rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=seed)\n",
    "\n",
    "blackbox_model = Pipeline([('pca', pca), ('rf', rf)])\n",
    "blackbox_model.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "embedded-organic",
   "metadata": {},
   "source": [
    "# Explain the Blackbox\n",
    "\n",
    "All you need for a blackbox interpretability method is a predict function from the target ML pipeline.\n",
    "\n",
    "Blackbox interpretability methods generally work by perturbing input data repeatedly passing it through the pipeline, and observing how the final prediction changes.\n",
    "\n",
    "As a result both global and local explanations are approximate, and may sometimes be inaccurate. Be cautious of the results in high-stakes environments."
   ]
  },
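  {
   "cell_type": "markdown",
   "id": "perturbation-sketch",
   "metadata": {},
   "source": [
    "The perturb-and-observe idea described above can be sketched in a few lines. This is a deliberately simplified one-feature-at-a-time sensitivity check, not the algorithm LIME uses (LIME fits a local surrogate model to perturbed samples). `toy_predict` and `perturbation_sensitivity` are hypothetical names introduced for illustration, and the weights are arbitrary.\n",
    "\n",
    "```python\n",
    "import math\n",
    "import random\n",
    "\n",
    "\n",
    "def toy_predict(row):\n",
    "    # Stand-in for a pipeline's predict function: returns a probability.\n",
    "    # The weights are arbitrary; the third feature is ignored on purpose.\n",
    "    score = 2.0 * row[0] - 0.5 * row[1] + 0.0 * row[2]\n",
    "    return 1.0 / (1.0 + math.exp(-score))\n",
    "\n",
    "\n",
    "def perturbation_sensitivity(predict_fn, row, n_samples=200, scale=0.5, seed=42):\n",
    "    # For each feature, add Gaussian noise to that feature alone and record\n",
    "    # the mean absolute change in the prediction.\n",
    "    rng = random.Random(seed)\n",
    "    baseline = predict_fn(row)\n",
    "    sensitivities = []\n",
    "    for i in range(len(row)):\n",
    "        total_change = 0.0\n",
    "        for _ in range(n_samples):\n",
    "            perturbed = list(row)\n",
    "            perturbed[i] += rng.gauss(0.0, scale)\n",
    "            total_change += abs(predict_fn(perturbed) - baseline)\n",
    "        sensitivities.append(total_change / n_samples)\n",
    "    return sensitivities\n",
    "\n",
    "\n",
    "sensitivities = perturbation_sensitivity(toy_predict, [0.2, 1.0, 3.0])\n",
    "print(sensitivities)\n",
    "```\n",
    "\n",
    "The third feature gets a sensitivity of exactly zero because `toy_predict` ignores it. The other estimates depend on the noise scale and the number of samples, which is one reason explanations of this kind are approximate."
   ]
  },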
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "after-landscape",
   "metadata": {},
   "outputs": [],
   "source": [
    "from interpret.blackbox import LimeTabular\n",
    "from interpret import show\n",
    "\n",
    "lime = LimeTabular(blackbox_model.predict_proba, X_train, random_state=seed)\n",
    "lime_local = lime.explain_local(X_test[:5], y_test[:5])\n",
    "\n",
    "show(lime_local, 0)"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
